ABSTRACT

In recent years, the number of studies investigating possible
non-invasive health screening techniques for infants has increased
exponentially. Among these, one of the most prominent is health
screening based on the acoustic investigation of infant cry.
Clinicians involved in the field have moved from visual inspection of
the audible spectrum to automated analysis of cry samples using
computer software. One program that has been widely adopted in recent
years is Praat, a free software package designed for speech analysis.
Unfortunately, the software’s default settings are not suitable for
the investigation of cry samples, and the settings actually used are
rarely reported in final manuscripts. In this article, we tested 4
different computer-generated signals, with frequency features
comparable to cry frequencies, and 3 real cry samples using both
Praat’s default and tuned settings. Our results highlight the
importance of properly tuning a software’s parameters when expanding
its field of usage, and provide a starting point for the development
of optimal Praat parameter selection for cry analysis.

1. INTRODUCTION 

Screening of infants’ health status can lead to early recognition of
developmental pathologies. This allows clinicians to define an
intervention program, which can lead to enhanced outcomes when
adopted in the earlier stages of life. Among infant health screening
methods, non-invasive techniques have received the highest level of
attention within the community of pediatricians and researchers.
Starting from the second half of the twentieth century, researchers
have investigated several possible ways to identify different
pathologies and developmental issues through non-invasive methods.

For example, pulse oximetry, a non-invasive technique that measures
the amount of oxygenated and deoxygenated hemoglobin in blood by
means of infrared light, has been widely tested for early screening
of congenital heart defects in asymptomatic newborn babies
[1][2][3][4][5]. Recently, in a review by Thangaratinam et al. [6],
the authors compared the overall sensitivity and false-positive rate
of this method against other screening techniques, including prenatal
ultrasounds and routine physical exams. One of the techniques in
which researchers’ interest has increased exponentially during the
last sixty years is the empirical analysis of infant cry [7][8][9].
Acoustical properties of infant cry have been associated with
different developmental pathologies, including Autism Spectrum
Disorders (ASD), Sudden Infant Death Syndrome (SIDS), hearing
impairments and unilateral cleft lip and palate (UCLP) [10][11][12].

1.1. Properties of infant cry

Cry sound utterances are produced by the larynx during the expiratory
phase of respiration. Pressure differences of air streams flowing
through the larynx cause the vocal folds to open and close rapidly,
from about 250 to about 550 times per second in healthy infants
[13][14][15][16][17][18][19]. This rate of vibration is defined as
the fundamental frequency (F0) [20][21]. The position of the vocal
folds is modulated by the central nervous system (CNS), and therefore
the activity of the vocal folds can be used to estimate an infant’s
developmental status. Moreover, the lower vocal tract produces
different sound characteristics, including the loudness of the
expiratory phase.

The upper vocal tract contributes instead to the production of higher
frequencies, resonances of the fundamental frequency [22][23]. During
the first two years of life, an infant’s body evolves. The vocal
tract takes shape during this period, and the acoustical properties
of cry vocalizations change accordingly [24][25][26].

Research studies conducted on infants suffering from pathological
conditions highlighted a positive shift in the spectrum of cry
frequency properties, as compared to those of healthy infants. For
example, investigation of infants at high risk of developing ASD
showed that the fundamental frequency of their cry vocalizations can
be higher than 700 Hz [27]. Analogously, F0 values collected from
vocalizations of infants suffering from colic were significantly
higher than those collected from healthy infants [28].


1.2. Cry analysis 

In a typical cry experiment, audio recordings are collected by
inducing infants to cry using a specific paradigm, or trigger (e.g.
pain caused by a heel prick test [29]). Collected samples are then
preprocessed to increase the signal-to-noise ratio.

During the 1960s, when systematic analysis of infant cry began,
researchers relied on visual inspection of spectrograms [30][31].
With the advent of more powerful computing devices, the techniques
and algorithms employed in cry analysis became more sophisticated,
producing more accurate and useful results.

Because of the similarities between infant vocalization and adult
voice, cry researchers adopted software designed for speech analysis.
One of the most widely used programs within the field is Praat, a
free software package developed by Paul Boersma and David Weenink and
specifically designed for the acoustic analysis of adult voice [32].
In the last 18 years, Praat has been used in 41.4% (N=36) of the
articles published within the field during this period for which
detailed information about the software in use is provided (N=87)
[8][9]. Despite being a robust tool for speech analysis, Praat’s
default parameters are not suitable for accurate analysis of cry
samples. In this article we discuss the role of Praat in cry
analysis, highlight the reasons why the default settings are not
suitable, and provide suggestions on how to apply it successfully to
cry samples.

2. PRAAT 

Praat features a graphical user interface that fits the needs of
different researchers, from phoneticians to musicians and biologists
involved in the acoustic analysis of animal vocalizations. Written in
C and C++, Praat provides tools for analyzing pitch (F0) and formants
in audio signals. In addition, Praat comes with a picture tool which
produces high-quality graphics ready to be used in manuscripts and
dissertations.

The software includes a general-purpose scripting language that can
be used to automate the analysis of multiple files, allowing for fast
processing of large numbers of audio samples [32].

Praat implements an autocorrelation algorithm for pitch analysis.
According to Boersma, the applied algorithm is not only more accurate
than other frequency-based pitch detection procedures, but is also
less dependent on the length of the selected window and more
resistant to rapid shifts and external noise [?].

2.1. Praat settings 

In this work, settings have been verified on Praat version 6.0.43 (8 
September 2018), running on a Linux machine (Linux Mint 19 Tara 
x86_64, Kernel: 4.15.0-42-generic). 

2.1.1. Pitch 

Default pitch settings point the algorithm to search for F0 in the
frequency range from 75 Hz to 500 Hz. As introduced above, the
fundamental frequency of healthy infant cry usually lies between 250
and 550 Hz, with higher values in sick infants.

With those settings, there are at least two possible situations in which 
Praat cannot identify the real fundamental frequency value: 

• F0 is above the upper cutoff. In this situation, Praat will
identify a wrong (lower) value for the fundamental, or provide no
pitch information within a window.

• F0 lies between the cutoff values, but strong noise with a
frequency between 75 and 250 Hz is present. When strong periodic
noise is recorded within the signal, for example from a split-system
air conditioner in the recording environment [?], the software may
identify this lower frequency as the real fundamental, especially
when the noise is at about half of the real fundamental frequency.
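
The first failure mode can be reproduced with a deliberately
simplified autocorrelation pitch estimator. The Python sketch below
is not Praat's algorithm (which adds window correction, interpolation
and path finding); the function name estimate_f0, the 600 Hz test
tone and the 30 ms frame are assumptions chosen for illustration. It
only shows how a search range capped at 500 Hz forces the estimate
onto a subharmonic of a higher fundamental.

```python
import math


def estimate_f0(samples, sample_rate, floor_hz, ceiling_hz):
    """Return the frequency whose lag maximises the raw autocorrelation.

    Simplified illustration only; this is NOT Praat's implementation.
    """
    min_lag = math.ceil(sample_rate / ceiling_hz)  # shortest period searched
    max_lag = int(sample_rate / floor_hz)          # longest period searched
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Raw (unnormalised) autocorrelation at this lag.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


SAMPLE_RATE = 44100  # assumed sampling rate
# Hypothetical 30 ms frame of a pure 600 Hz tone (above the 500 Hz default).
tone = [math.sin(2 * math.pi * 600 * n / SAMPLE_RATE) for n in range(1323)]

default_estimate = estimate_f0(tone, SAMPLE_RATE, 75, 500)   # default range
tuned_estimate = estimate_f0(tone, SAMPLE_RATE, 250, 800)    # tuned range
print(default_estimate, tuned_estimate)
```

Under these assumptions the default range settles on a subharmonic of
the tone (around 300 Hz), whereas the wider range recovers a value
close to 600 Hz.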

2.1.2. Formants 

Standard formant settings are used to obtain up to 5 formants with a
frequency lower than 5500 Hz. The GUI returns n − 1 formant frequency
values, where n is the number of formants indicated within the
settings.

3. ANALYSIS OF COMPUTER GENERATED SIGNALS 

To better illustrate pitch and formant extraction errors, we tested
Praat with standard and cry-suitable settings on a set of
computer-generated signals with a specific F0, to which white noise
was added. Formants (N=5), with a frequency of about F0 × (n + 1) and
decreasing amplitudes [?, p. 306], were added to the generated
signals. For half of the files, noise in a specific frequency band
close to F0/2 was added. Four different signals of 5 s length were
generated. The audio files and the Python source code used to
generate those signals are available online¹. The frequency values
used for F0, the formants and, where added, F0/2 are reported in
Table 1 (Real). To verify the validity of the generated files, a
visual inspection of the spectrum was conducted. The resulting
frequency peaks are shown in parentheses in Table 1.
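
As an illustration of the procedure described above, a signal of this
kind could be synthesized as in the following Python sketch. It is
not the code from the repository: the function name, the 1/(k + 1)
amplitude decay, the noise level, and the modelling of the F0/2 noise
band as a single sine are all assumptions.

```python
import math
import random


def synthesize_cry_like_signal(f0, duration_s=5.0, sample_rate=44100,
                               n_formants=5, noise_level=0.05,
                               add_half_f0=False, seed=0):
    """Return a list of samples in [-1, 1] for a synthetic test signal."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    # Component k has frequency f0 * (k + 1): k = 0 is the fundamental,
    # k = 1..n_formants are the components at F0 * (n + 1).
    # The 1 / (k + 1) amplitude decay is an assumption.
    components = [(f0 * (k + 1), 1.0 / (k + 1))
                  for k in range(n_formants + 1)]
    if add_half_f0:
        # Assumed stand-in for the periodic noise close to F0/2.
        components.append((f0 / 2.0, 0.5))
    samples = []
    for i in range(round(duration_s * sample_rate)):
        t = i / sample_rate
        value = sum(a * math.sin(2 * math.pi * f * t) for f, a in components)
        value += rng.gauss(0.0, noise_level)  # additive white noise
        samples.append(value)
    peak = max(abs(s) for s in samples)
    return [s / peak for s in samples]  # normalise to avoid clipping


# Short hypothetical example: 0.1 s, F0 = 600 Hz, with the F0/2 component.
example = synthesize_cry_like_signal(600.0, duration_s=0.1, add_half_f0=True)
```

The resulting list can be written to a WAV file with any audio
library; the seed makes repeated runs produce identical samples.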

Using Praat, we extracted the values of pitch and formants at
t = 2.5 s, using both Praat’s standard (Praat S.) and cry-optimized
(Praat O.) settings. The optimized settings were:

• Pitch:

- Pitch range (Hz) = 250.0 - 800.0

• Formants:

- Maximum formant (Hz) = 4500.0

Fundamental frequencies and formants have also been verified by
visual inspection of the spectrogram using Audacity version 2.2.1 and
the following settings:

• Algorithm: Spectrum 

• Window size: 1024 

• Function: Hanning window 

• Axis: Logarithmic frequency 

Pitch and formant frequencies obtained using the two sets of
settings, and their Mean Absolute Percentage Error (MAPE), are
reported in Table 1.
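
For reference, the MAPE over a set of frequency values is the mean of
|actual − estimated| / |actual|, expressed as a percentage. A minimal
Python sketch follows; the function name and the example numbers are
illustrative and are not taken from Table 1.

```python
def mape(actual, estimated):
    """Mean Absolute Percentage Error between two equal-length sequences."""
    if len(actual) != len(estimated) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs((a - e) / a) for a, e in zip(actual, estimated)]
    return 100.0 * sum(errors) / len(errors)


# Made-up values: the pitch is halved, the two other peaks are exact.
reference_peaks = [600.0, 1200.0, 1800.0]
software_peaks = [300.0, 1200.0, 1800.0]
example_mape = mape(reference_peaks, software_peaks)
```

In this made-up case a single subharmonic error on the pitch
contributes a 50% error to one of the three terms, so the MAPE is one
third of 50%.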

4. DISCUSSION 

As described above and demonstrated by the analysis of simple
computer-generated samples, Praat’s default settings are not suitable
for the analysis of infant cry. In example A.wav, F0 is located
between the pitch cutoff values and no periodic noise was added. We
can observe that parameter optimization led to a general improvement
of formant estimation, with the MAPE drastically reduced.

Similarly, in example B.wav, F0 was still between the pitch cut-off
values, and the periodic noise was above the lower cut-off of the
standard settings but below that of the optimized settings. Here the
optimized configuration granted a better recognition of the
fundamental as well as of the formants.

¹ https://github.com/ABPLab/Praat-LAC2019

In example C.wav, F0 was higher than the upper pitch cut-off of
Praat’s standard settings. Here, pitch recognition identified the
wrong peak as the signal pitch. This situation did not occur when the
parameters were optimized and the upper cut-off was increased. This
is especially important when working with infants who have a
pathology, or where the risk of developmental pathology is high, and
the acoustic properties of cry are therefore expected to differ from
those of healthy infants.

Finally, as shown with file D.wav, when periodic noise was present at
about half of a high fundamental frequency, the software made a
recognition error even with optimized parameters. This did not happen
when the spectrum was visually inspected, since it was clear that the
amplitude of the peak at F0/2 was lower than that of the peak at F0,
as visible in Figure 1.

Parameter tuning sharpens extracted features, but because of the
properties of cry, researchers still have to pay special attention to
the obtained values, as well as to the quality of the collected data.

Generally, we can expect Praat with standard settings to perform
poorly when employed in infant cry studies, because of the complexity
of the signal itself and of the presence of external noise. In the
next section, we show the performance of Praat on real cry samples,
using both the standard and optimized settings.


Figure 1: Spectrum of D.wav, extracted using Audacity; the horizontal
axis shows frequency (Hz). Pitch (F0), formants (F1, F2, F3, F4) and
periodic noise (F0/2) have been labelled accordingly.

5. ANALYSIS OF REAL CRY SIGNALS

In order to provide a demonstration of Praat’s performance on real
cry samples, we analyzed infant utterances from a public dataset [?].
More specifically, we assessed the first three utterances from the
file "BabyCrying2.wav", referred to here as "Utterance1",
"Utterance2" and "Utterance3".

F0 and formants were first obtained by visual inspection with
Audacity, using the same configuration used to obtain the spectrum of
the computer-generated signals. Because of the properties of cry, the
reported values are the mean values over a whole utterance. Frequency
peaks are reported in Table 2. Then, each utterance was analyzed in
Praat, using both the default settings (Praat S.) and our suggested
settings (Praat O.). For each pair of file and settings, we estimated
the Mean Absolute Percentage Error, using as the actual value the
peak obtained manually in Audacity by visual inspection of the
spectrum. Pitch and formant frequency values and the MAPE per file
and settings are reported in Table 2.

6. DISCUSSION

As shown in Table 2, the difference in the estimated MAPE of the
investigated samples follows what was shown for computer-generated
signals in Table 1. Similarly to the previous examples, the higher
the formant number, the larger the difference between the peak
detected by Praat and the one found by visual inspection.

With an average reduction in the estimated MAPE of 18.4%, a quick
optimization of the pitch and formant detection parameters proved
helpful in increasing the accuracy of the estimated features. As
demonstrated by our examples, differences in the settings used can
result in a large variance in the estimated frequency values. Because
of that, we would expect researchers involved in cry studies to tune
the software properly and to report the settings used in final
manuscripts. Unfortunately, this is not the case: in only 12 of the
36 studies in which researchers used Praat were details about the
settings provided [?].

7. CONCLUSIONS

In this work, we demonstrated the different levels of performance
that Praat, an open source software package designed for speech
analysis, can achieve on infant cry samples with and without
parameter tuning. In the first part of this work, we generated
different acoustic signals with features similar to those of real cry
samples. The generated files were analyzed first by visual
inspection, then using Praat’s standard settings and finally by
fine-tuning the algorithms’ parameters. The performance of the
software was evaluated using the Mean Absolute Percentage Error
(MAPE). In the second part of this work, we applied the same
procedure to a set of real cry utterances. Our results show that
Praat’s standard settings are not suitable for the analysis of cry
signals, and therefore the software should not be employed in cry
studies without tuning. Researchers have to carefully examine the
collected data, to ensure that no external sources of periodic noise
are recorded within the signals. Furthermore, because of the high
inter-individual variability of cry properties, it may be advisable
to tune pitch and formant extraction settings according to the
investigated participants and their health status. We advise
researchers in the field to test Praat’s parameters with more complex
and extreme cry sounds so as to identify the extent to which the
software can be correctly integrated in cry studies.

Table 1: Real and Praat’s estimated values for generated acoustic signal properties. For Praat estimation, standard (Praat S.) and optimized
(Praat O.) settings were used and values were computed at t = 2.5 s. Values obtained by visual inspection of the spectrum generated with
Audacity version 2.2.1 are given in parentheses. For each pair of file and settings, the Mean Absolute Percentage Error (MAPE) has been calculated.
been estimated using as Actual value the peak highlighted by Audacity through visual inspection.